-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Optimize the codegen for Span::from_expansion
#140485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
Is this function this hot that it needs micro-optimized? Have you benchmark it on real world case? |
There are more than 300 call sites (most from clippy) and it ends up being called more than once per expression while linting (in clippy). It's definitely used a lot and frequently called. |
e332f70
to
761d0ec
Compare
I wonder how to write this better so it doesn't look like magic. |
Perhaps like this: let ctxt = match_span_kind! {
self,
// All branches here, except `InlineParent`, actually return `span.ctxt_or_parent_or_marker`,
// this makes the code optimize very well by eliminating the `ctxt_or_parent_or_marker` comparison.
InlineCtxt(span) => SyntaxContext::from_u16(span.ctxt),
InlineParent(_span) => SyntaxContext::root(),
PartiallyInterned(span) => SyntaxContext::from_u16(span.ctxt),
Interned(_span) => SyntaxContext::from_u16(CTXT_INTERNED_MARKER),
}; |
In any case, let's benchmark this on rustc too. |
This comment has been minimized.
This comment has been minimized.
Optimize the codegen for `Span::from_expansion` See https://godbolt.org/z/bq65Y6bc4 for the difference. the new version is less than half the number of instructions. Also tried fully writing the function by hand: ```rust sp.ctxt_or_parent_or_marker != 0 && ( sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER || sp.len_with_tag_or_marker & PARENT_TAG == 0 ) ``` But that was no better than this PR's current use of `match_span_kind`.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (fb65274): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary 0.7%, secondary 1.3%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -0.7%, secondary -2.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -1.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 770.233s -> 770.093s (-0.02%) |
761d0ec
to
d9c060b
Compare
Went with a more verbose explanation of why this version optimizes better and why it gets the right result. Also that's a little more of a benefit than I was expecting from just rustc. |
The |
@bors r+ rollup=maybe |
…chenkov Optimize the codegen for `Span::from_expansion` See https://godbolt.org/z/bq65Y6bc4 for the difference. the new version is less than half the number of instructions. Also tried fully writing the function by hand: ```rust sp.ctxt_or_parent_or_marker != 0 && ( sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER || sp.len_with_tag_or_marker & PARENT_TAG == 0 ) ``` But that was no better than this PR's current use of `match_span_kind`.
…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#134034 (handle paren in macro expand for let-init-else expr) - rust-lang#139186 (Refactor `diy_float`) - rust-lang#140062 (std: mention `remove_dir_all` can emit `DirectoryNotEmpty` when concurrently written into) - rust-lang#140430 (Improve test coverage of HIR pretty printing.) - rust-lang#140485 (Optimize the codegen for `Span::from_expansion`) - rust-lang#140505 (linker: Quote symbol names in .def files) - rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses) r? `@ghost` `@rustbot` modify labels: rollup
…chenkov Optimize the codegen for `Span::from_expansion` See https://godbolt.org/z/bq65Y6bc4 for the difference. the new version is less than half the number of instructions. Also tried fully writing the function by hand: ```rust sp.ctxt_or_parent_or_marker != 0 && ( sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER || sp.len_with_tag_or_marker & PARENT_TAG == 0 ) ``` But that was no better than this PR's current use of `match_span_kind`.
Rollup of 11 pull requests Successful merges: - rust-lang#134034 (handle paren in macro expand for let-init-else expr) - rust-lang#138703 (chore: remove redundant words in comment) - rust-lang#139186 (Refactor `diy_float`) - rust-lang#139343 (Change signature of File::try_lock and File::try_lock_shared) - rust-lang#139780 (docs: Add example to `Iterator::take` with `by_ref`) - rust-lang#139802 (Fix some grammar errors and hyperlinks in doc for `trait Allocator`) - rust-lang#140034 (simd_select_bitmask: the 'padding' bits in the mask are just ignored) - rust-lang#140062 (std: mention `remove_dir_all` can emit `DirectoryNotEmpty` when concurrently written into) - rust-lang#140485 (Optimize the codegen for `Span::from_expansion`) - rust-lang#140505 (linker: Quote symbol names in .def files) - rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses) r? `@ghost` `@rustbot` modify labels: rollup
…chenkov Optimize the codegen for `Span::from_expansion` See https://godbolt.org/z/bq65Y6bc4 for the difference. the new version is less than half the number of instructions. Also tried fully writing the function by hand: ```rust sp.ctxt_or_parent_or_marker != 0 && ( sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER || sp.len_with_tag_or_marker & PARENT_TAG == 0 ) ``` But that was no better than this PR's current use of `match_span_kind`.
Rollup of 10 pull requests Successful merges: - rust-lang#134034 (handle paren in macro expand for let-init-else expr) - rust-lang#138703 (chore: remove redundant words in comment) - rust-lang#139186 (Refactor `diy_float`) - rust-lang#139343 (Change signature of File::try_lock and File::try_lock_shared) - rust-lang#139780 (docs: Add example to `Iterator::take` with `by_ref`) - rust-lang#139802 (Fix some grammar errors and hyperlinks in doc for `trait Allocator`) - rust-lang#140034 (simd_select_bitmask: the 'padding' bits in the mask are just ignored) - rust-lang#140062 (std: mention `remove_dir_all` can emit `DirectoryNotEmpty` when concurrently written into) - rust-lang#140485 (Optimize the codegen for `Span::from_expansion`) - rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses) r? `@ghost` `@rustbot` modify labels: rollup
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#140485 (Optimize the codegen for `Span::from_expansion`) - rust-lang#140509 (transmutability: merge contiguous runs with a common destination) - rust-lang#140519 (Use select in projection lookup in `report_projection_error`) - rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses) - rust-lang#140536 (Rename `*Guard::try_map` to `filter_map`.) - rust-lang#140550 (Stabilize `select_unpredictable`) - rust-lang#140563 (extend the list of registered dylibs on `test::prepare_cargo_test`) - rust-lang#140572 (Add useful comments on `ExprKind::If` variants.) - rust-lang#140574 (Add regression test for 133065) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#140485 - Jarcho:from_expansion_opt, r=petrochenkov Optimize the codegen for `Span::from_expansion` See https://godbolt.org/z/bq65Y6bc4 for the difference. the new version is less than half the number of instructions. Also tried fully writing the function by hand: ```rust sp.ctxt_or_parent_or_marker != 0 && ( sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER || sp.len_with_tag_or_marker & PARENT_TAG == 0 ) ``` But that was no better than this PR's current use of `match_span_kind`.
See https://godbolt.org/z/bq65Y6bc4 for the difference. the new version is less than half the number of instructions.
Also tried fully writing the function by hand:
But that was no better than this PR's current use of
match_span_kind
.